AITopics | figure 12

cd706106802dbea2068efd7031c3b420-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 23:22:52 GMT

activation function, approximation, hypothesis, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)

Pseudo codes

Neural Information Processing Systems, Feb-11-2026, 22:22:58 GMT

Note that we don't validate the inner-loop's λ at every outer-loop iteration, but keep changing it on the fly at each validation cycle.

artificial intelligence, distill-cf, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

A Data Collection and Details about the

Neural Information Processing Systems, Feb-10-2026, 11:13:31 GMT

We collected about 30 million text-image pairs from multiple channels and built a new 2.5 TB dataset (about 250 GB after tokenization). The data sources fall into the following categories: (1) Professional image websites (both English and Chinese). Images on these websites usually come with captions. We have already introduced tokenizers in Section 2.2, and here are some details. Colored grids are all the tokens attended to by the token marked "O".

caption, machine learning, natural language, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

4fc81f4cd2715d995018e0799262176b-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 21:59:24 GMT

Two other important techniques are mixed precision training [36] and in-place activated BatchNorm [53]. Mixed precision training involves training using both 32-bit and 16-bit IEEE floating point numbers depending on the numerical sensitivity of different layers [36].
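
As a rough illustration of the numerical sensitivity mentioned above, the sketch below accumulates the same small update in 16-bit and in 32-bit IEEE floating point. It is not the method of [36]; the helper names (`to_fp16`, `to_fp32`, `accumulate`) and all numbers are hypothetical, and the rounding is emulated with Python's standard `struct` module rather than real hardware types.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(start: float, update: float, steps: int, cast) -> float:
    """Add `update` to `start` `steps` times, rounding with `cast` each step."""
    value = cast(start)
    for _ in range(steps):
        value = cast(value + update)
    return value


# Hypothetical numbers chosen for illustration only.
update = to_fp16(1e-4)  # a small update that is itself representable in 16 bits
weight_fp16 = accumulate(1.0, update, 1000, to_fp16)
weight_fp32 = accumulate(1.0, update, 1000, to_fp32)

print(weight_fp16)  # the 16-bit accumulator never moves away from 1.0
print(weight_fp32)  # the 32-bit accumulator ends close to 1.1
```

Near 1.0 the spacing between adjacent 16-bit values is about 0.001, so each update of about 0.0001 is rounded away, while the 32-bit accumulator retains it. This is one reason a quantity that is sensitive to small increments might be kept in 32-bit precision.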

artificial intelligence, machine learning, xij, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.33)

Predicting Training Time Without Training: Supplementary Material

Neural Information Processing Systems, Feb-8-2026, 06:05:19 GMT

In both cases we observe that the predicted curve is reasonably close to the actual curve, more so at the beginning of the training (which is expected, since the linear approximation is more likely to hold). Point-wise similarity of predicted and observed loss curve. Up to now we focused on prediction error rates (see e.g. We started defining training time as the first time the (smoothed) loss is below a given threshold (which we then normalized w.r.t. In Section 4 we suggest that, in the case of MSE loss, it is possible to predict the training time on a large dataset using a subset of the samples. However, since our training time definition measures the time to reach the asymptotic value (which is what is most useful in practice) rather than the time to reach an absolute threshold, this does not affect the accuracy of the prediction (see Appendix C).
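
The threshold-based definition quoted above (the first time the smoothed loss falls below a given threshold) can be sketched as follows. The excerpt does not say how the loss is smoothed or how the threshold is normalized, so the exponential moving average, the parameter `alpha`, and the function names here are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def smooth(losses: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Exponential moving average of the loss (the smoothing choice is an assumption)."""
    smoothed: List[float] = []
    running: Optional[float] = None
    for loss in losses:
        running = loss if running is None else alpha * loss + (1 - alpha) * running
        smoothed.append(running)
    return smoothed


def training_time(
    losses: Sequence[float], threshold: float, alpha: float = 0.5
) -> Optional[int]:
    """Return the first step whose smoothed loss is below `threshold`, or None."""
    for step, value in enumerate(smooth(losses, alpha)):
        if value < threshold:
            return step
    return None


# Hypothetical loss curve, for illustration only.
losses = [4.0, 2.0, 1.0, 0.5, 0.25]
print(smooth(losses))              # [4.0, 3.0, 2.0, 1.25, 0.75]
print(training_time(losses, 1.0))  # 4
print(training_time(losses, 0.1))  # None: the threshold is never reached
```

Returning `None` when the threshold is never reached keeps a run that has not yet converged distinct from one that converged at step 0.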

artificial intelligence, machine learning, training time, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

24d6d158531508115e628188e2697f76-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-7-2026, 22:35:10 GMT

Next, we 6.1.2. Figure connected.

artificial intelligence, decay, dropout, (18 more...)

Country: Europe (0.05)

Technology: Information Technology > Artificial Intelligence (0.31)

A Proof of Lemma 1 According to the second condition in (8), we have q (x) = q (x

Neural Information Processing Systems, Aug-19-2025, 00:12:57 GMT

Therefore, it fails to control the false positive rate. Figure 10: Distribution of naive p-value when the null hypothesis is true. Figure 11: Distribution of selective p-value when the null hypothesis is true. Figure 12: Uniform QQ-plot of the pivot. In the above example, we used 3 cuts (pieces) to approximate the function. In Figure 13, we show that the number of encountered intervals still increases linearly in practice. Figure 13: Demonstration of the number of encountered and truncation intervals when increasing the number of cuts (pieces).

hypothesis, lemma 1, second condition, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)
